Text segmentation by language

Related resources

Text Segmentation by Language Using Minimum Description Length

The problem addressed in this paper is to segment a given multilingual document into segments for each language and then identify the language of each segment. The problem was motivated by an attempt to collect a large amount of linguistic data for non-major languages from the web. The problem is formulated in terms of obtaining the minimum description length of a text, and the proposed solutio...

Multiple text segmentation for statistical language modeling

In this article we deal with the text segmentation problem in statistical language modeling for under-resourced languages whose writing systems lack word boundary delimiters. While the lack of text resources has a negative impact on the performance of language models, the errors introduced by automatic word segmentation make those data even less usable. To better exploit the text resour...

Text Segmentation by

We investigate the problem of text segmentation by topic. Applications for this task include topic tracking of broadcast speech data and topic identification in full-text databases. Researchers have tackled similar problems before but with different goals. This study focuses on data with relatively small segment sizes and for which within-segment sentences have relatively few words in common maki...

Unsupervised Text Segmentation Based on Native Language Characteristics

Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language. We propose a Bayesian unsupervised text segmentation approach to the latter. While baseline models achieve essentially random segmentation on our task, indicating its difficulty, a Bayesian mod...

Journal

Journal title: Sistemas y Telemática

Year: 2016

ISSN: 1692-5238

DOI: 10.18046/syt.v14i38.2289